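[Tutorial 3] Loading a pandas.DataFrame

This tutorial shows how to load pandas DataFrames into a tf.data.Dataset and train a model on them.

It uses a small dataset provided by the Cleveland Clinic Foundation for Heart Disease.

The dataset contains a few hundred rows; each row describes one patient, and each column is an attribute of that patient.

We will use this information to predict whether a patient has heart disease.

Because the prediction is whether the patient has the disease or not, this is a binary classification problem.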

import warnings
warnings.simplefilter('ignore')

import pandas as pd
import tensorflow as tf

def get_compiled_model():
    # Two hidden layers followed by a single sigmoid unit for binary classification.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(10, activation='relu'),
        tf.keras.layers.Dense(10, activation='relu'),
        tf.keras.layers.Dense(1, activation='sigmoid')
    ])

    model.compile(optimizer='adam',
                  loss='binary_crossentropy',
                  metrics=['accuracy'])
    return model

model = get_compiled_model()
# train_dataset is the tf.data.Dataset built from the DataFrame in section 2.
model.fit(train_dataset, epochs=15)
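The model above ends in a single sigmoid unit and is trained with binary cross-entropy. As a rough illustration of what those two pieces compute, here is a plain-Python sketch. It is not how Keras implements them; the logits, labels, and the clipping constant `eps` are made-up values chosen for this example.

```python
import math

def sigmoid(z):
    """Map a real-valued logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def binary_crossentropy(y_true, p_pred, eps=1e-7):
    """Mean binary cross-entropy over paired labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0); eps is an assumption
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)

logits = [2.0, -1.0, 0.0]  # made-up raw outputs of the last layer
labels = [1, 0, 1]         # made-up ground-truth labels
probs = [sigmoid(z) for z in logits]
loss = binary_crossentropy(labels, probs)
print(probs, loss)
```

A prediction of 0.5 for a positive example costs log(2), and the loss shrinks toward 0 as the predicted probability approaches the true label.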

4. An alternative to feature columns

Passing a dictionary as the model input is as easy as creating a matching dictionary of tf.keras.layers.Input layers. With the functional API you can apply any preprocessing and stack further layers on top.

This can be used in place of feature columns.

inputs = {key: tf.keras.layers.Input(shape=(), name=key) for key in df.keys()}
x = tf.stack(list(inputs.values()), axis=-1)

x = tf.keras.layers.Dense(10, activation='relu')(x)
output = tf.keras.layers.Dense(1, activation='sigmoid')(x)

model_func = tf.keras.Model(inputs=inputs, outputs=output)

model_func.compile(optimizer='adam',
                   loss='binary_crossentropy',
                   metrics=['accuracy'])
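To make the shape change in the stacking step concrete, the sketch below mimics in plain Python what stacking per-feature vectors along the last axis does: a dictionary of columns becomes a list of rows, one row per example. The helper `stack_last_axis` and the column names and values are hypothetical, introduced only for this illustration.

```python
def stack_last_axis(columns):
    """Turn {name: [v0, v1, ...]} into rows [[a0, b0, ...], [a1, b1, ...], ...]."""
    return [list(row) for row in zip(*columns.values())]

# Toy batch of two examples with two made-up features.
feature_batch = {"age": [63.0, 67.0], "chol": [233.0, 286.0]}
rows = stack_last_axis(feature_batch)
print(rows)  # [[63.0, 233.0], [67.0, 286.0]]
```

Each inner list corresponds to one example, which is the layout a Dense layer expects: (batch size, number of features).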

When using tf.data, the easiest way to preserve the column structure of a pd.DataFrame is to convert it to a dictionary and slice that dictionary.

dict_slices = tf.data.Dataset.from_tensor_slices((df.to_dict('list'), target.values)).batch(16)

for dict_slice in dict_slices.take(1):
    print(dict_slice)

model_func.fit(dict_slices, epochs=15)
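As a conceptual aid, the following plain-Python sketch imitates what slicing a dictionary of columns and then batching produces: each element is a (features, label) pair in which the features keep their column names. This is not TensorFlow's implementation; the helpers `slice_dict` and `batch`, and the toy columns and labels, are assumptions made for illustration.

```python
def slice_dict(columns, labels):
    """Yield (features, label) pairs, one per row, from a dict of equal-length columns."""
    for i, label in enumerate(labels):
        yield {name: values[i] for name, values in columns.items()}, label

def batch(pairs, size):
    """Group consecutive (features, label) pairs into batches of at most `size` rows."""
    pairs = list(pairs)
    for start in range(0, len(pairs), size):
        chunk = pairs[start:start + size]
        features = {name: [f[name] for f, _ in chunk] for name in chunk[0][0]}
        yield features, [label for _, label in chunk]

# Toy data: three rows, two made-up feature columns, and binary labels.
columns = {"age": [63, 67, 41], "chol": [233, 286, 204]}
labels = [0, 1, 0]

batches = list(batch(slice_dict(columns, labels), size=2))
for features, targets in batches:
    print(features, targets)
```

Because every batch is still keyed by column name, it can be matched against a dictionary of named model inputs, which is why the dictionary form pairs naturally with the functional model above.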

# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

Contents

1. Reading data with pandas

2. Loading data with `tf.data.Dataset`

3. Creating and training a model

4. An alternative to feature columns

Copyright 2019 The TensorFlow Authors.

2. Loading data with `tf.data.Dataset`